1 dominatR: A Package for Normalization of RNA-seq and Gene Expression Data

Introduction

The dominatR package provides a flexible suite of normalization methods for transcriptomics data, including CPM, TPM, RPKM, min-max scaling, and quantile normalization. It is compatible with common data structures such as matrix, data.frame, and SummarizedExperiment, and is designed to streamline preprocessing in bioinformatics workflows.

2 Installation

You can install the development version of ‘dominatR’ from GitHub:

# From GitHub (development version)
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")
remotes::install_github("EthanCHEN6/dominatR_testing", force = TRUE)

3 Data: Airway Gene Expression

This dataset contains RNA-Seq counts for airway smooth muscle cells, commonly used for gene expression analysis.

3.1 Load Data

# Load required packages
library(dominatR)
library(SummarizedExperiment)

# Load airway dataset
library(airway)
data(airway)
airway_se <- airway

3.2 Data Description

The airway dataset is structured as a SummarizedExperiment object:

  • Rows represent genes.
  • Columns represent samples.
  • Assay contains raw count data.

Inspect the dataset:

airway_se
#> class: RangedSummarizedExperiment 
#> dim: 63677 8 
#> metadata(1): ''
#> assays(1): counts
#> rownames(63677): ENSG00000000003 ENSG00000000005 ... ENSG00000273492
#>   ENSG00000273493
#> rowData names(10): gene_id gene_name ... seq_coord_system symbol
#> colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
#> colData names(9): SampleName cell ... Sample BioSample
assayNames(airway_se)
#> [1] "counts"
dim(assay(airway_se))
#> [1] 63677     8
head(assay(airway_se))
#>                 SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516
#> ENSG00000000003        679        448        873        408       1138
#> ENSG00000000005          0          0          0          0          0
#> ENSG00000000419        467        515        621        365        587
#> ENSG00000000457        260        211        263        164        245
#> ENSG00000000460         60         55         40         35         78
#> ENSG00000000938          0          0          2          0          1
#>                 SRR1039517 SRR1039520 SRR1039521
#> ENSG00000000003       1047        770        572
#> ENSG00000000005          0          0          0
#> ENSG00000000419        799        417        508
#> ENSG00000000457        331        233        229
#> ENSG00000000460         63         76         60
#> ENSG00000000938          0          0          0

It contains raw read counts for 63,677 genes across 8 samples.

4 Normalization Methods

Normalization is critical for correcting technical biases and enabling meaningful biological comparisons.

The package provides five normalization methods: min-max scaling, quantile normalization, CPM, RPKM, and TPM.

Let’s explore each method on the count data set described above.

4.1 Min-Max Normalization

Min-Max normalization is a linear transformation that rescales the values in each column (sample) to a specified range, typically [0, 1]. It is useful when you want to bring samples onto a common scale.

Function Purpose:

· Rescales each column to fit within a range [new_min, new_max].

· Preserves the relative structure of values within each column.

· Useful when different assays or samples have varying scales.
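The column-wise rescaling described above can be sketched in base R. This is an illustration of the technique only, not the dominatR implementation; `minmax_sketch` and the toy matrix are hypothetical names introduced here.

```r
# Column-wise min-max rescaling in base R (illustrative sketch only;
# minmax_sketch is a hypothetical helper, not part of dominatR).
minmax_sketch <- function(x, new_min = 0, new_max = 1) {
    x <- as.matrix(x)
    apply(x, 2, function(col) {
        rng <- range(col, na.rm = TRUE)
        if (rng[1] == rng[2]) {
            # Constant column: nothing to rescale, so return new_min
            return(rep(new_min, length(col)))
        }
        (col - rng[1]) / (rng[2] - rng[1]) * (new_max - new_min) + new_min
    })
}

# Toy data: 3 genes (rows) x 2 samples (columns), filled column by column
toy <- matrix(c(0, 5, 10,
                2, 4, 6),
              nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

scaled <- minmax_sketch(toy)                                   # range [0, 1]
scaled_10_20 <- minmax_sketch(toy, new_min = 10, new_max = 20) # range [10, 20]
```

Each column is rescaled independently, so the smallest value in every sample maps to new_min and the largest to new_max regardless of the sample's original scale.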

4.1.1 Example 1: Normalize a matrix

# Prepare input matrix
count_mat <- assay(airway)

# Apply min-max normalization
airway_minmax <- minmax_normalization(count_mat, new_min = 0, new_max = 1)

# Inspect structure
dim(airway_minmax)
#> [1] 63677     8
summary(as.vector(airway_minmax))
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#> 0.0000000 0.0000000 0.0000000 0.0009679 0.0000274 1.0000000
head(airway_minmax[, 1:5])
#>                   SRR1039508   SRR1039509   SRR1039512   SRR1039513
#> ENSG00000000003 0.0022792424 0.0017523136 1.699217e-03 0.0014897144
#> ENSG00000000005 0.0000000000 0.0000000000 0.000000e+00 0.0000000000
#> ENSG00000000419 0.0015676086 0.0020143784 1.208721e-03 0.0013327102
#> ENSG00000000457 0.0008727585 0.0008253084 5.119062e-04 0.0005988068
#> ENSG00000000460 0.0002014058 0.0002151278 7.785646e-05 0.0001277941
#> ENSG00000000938 0.0000000000 0.0000000000 3.892823e-06 0.0000000000
#>                   SRR1039516
#> ENSG00000000003 2.860799e-03
#> ENSG00000000005 0.000000e+00
#> ENSG00000000419 1.475649e-03
#> ENSG00000000457 6.159013e-04
#> ENSG00000000460 1.960829e-04
#> ENSG00000000938 2.513883e-06

Why Use Custom Ranges? You can set new_min = 10 and new_max = 20 if your downstream application expects values on a different scale:

df_scaled <- minmax_normalization(count_mat, new_min = 10, new_max = 20)
head(df_scaled)  # All columns now range from 10 to 20
#>                 SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516
#> ENSG00000000003   10.02279   10.01752   10.01699   10.01490   10.02861
#> ENSG00000000005   10.00000   10.00000   10.00000   10.00000   10.00000
#> ENSG00000000419   10.01568   10.02014   10.01209   10.01333   10.01476
#> ENSG00000000457   10.00873   10.00825   10.00512   10.00599   10.00616
#> ENSG00000000460   10.00201   10.00215   10.00078   10.00128   10.00196
#> ENSG00000000938   10.00000   10.00000   10.00004   10.00000   10.00003
#>                 SRR1039517 SRR1039520 SRR1039521
#> ENSG00000000003   10.02607   10.02033   10.01536
#> ENSG00000000005   10.00000   10.00000   10.00000
#> ENSG00000000419   10.01990   10.01101   10.01364
#> ENSG00000000457   10.00824   10.00615   10.00615
#> ENSG00000000460   10.00157   10.00201   10.00161
#> ENSG00000000938   10.00000   10.00000   10.00000

4.1.2 Example 2: Normalize a SummarizedExperiment

se <- airway

# Option A: Overwrite the default assay
se1 <- minmax_normalization(se)
head(assay(se1))
#>                   SRR1039508   SRR1039509   SRR1039512   SRR1039513
#> ENSG00000000003 0.0022792424 0.0017523136 1.699217e-03 0.0014897144
#> ENSG00000000005 0.0000000000 0.0000000000 0.000000e+00 0.0000000000
#> ENSG00000000419 0.0015676086 0.0020143784 1.208721e-03 0.0013327102
#> ENSG00000000457 0.0008727585 0.0008253084 5.119062e-04 0.0005988068
#> ENSG00000000460 0.0002014058 0.0002151278 7.785646e-05 0.0001277941
#> ENSG00000000938 0.0000000000 0.0000000000 3.892823e-06 0.0000000000
#>                   SRR1039516   SRR1039517   SRR1039520   SRR1039521
#> ENSG00000000003 2.860799e-03 0.0026074678 0.0020325525 0.0015356158
#> ENSG00000000005 0.000000e+00 0.0000000000 0.0000000000 0.0000000000
#> ENSG00000000419 1.475649e-03 0.0019898441 0.0011007460 0.0013637987
#> ENSG00000000457 6.159013e-04 0.0008243284 0.0006150451 0.0006147833
#> ENSG00000000460 1.960829e-04 0.0001568963 0.0002006156 0.0001610786
#> ENSG00000000938 2.513883e-06 0.0000000000 0.0000000000 0.0000000000

# Option B: Write to a new assay slot
se2 <- minmax_normalization(se, new_assay_name = "minmax_counts")

Example Output: Suppose, hypothetically, that a column contains expression values between 233 and 12890. After min-max normalization with new_min = 0 and new_max = 1:

233 → mapped to 0.0

12890 → mapped to 1.0

In the airway data, the first gene in SRR1039508 has a raw count of 679, which is rescaled to about 0.0023 on the [0, 1] scale. Because the column minimum is 0, this implies a column maximum of roughly 298,000 counts. The value sits near the lower end of the sample's range, but that does not make the gene weakly expressed: the range is stretched by a few very highly expressed genes, and most genes have even smaller scaled values (the median across the matrix is 0).

4.2 Quantile Normalization

Quantile normalization makes the distribution of values across all samples identical. This technique adjusts the data so that the rank distributions of the data across samples are equal.
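A minimal base R sketch of the idea follows. It assumes the textbook procedure (rank within each sample, average the sorted values across samples, map ranks back) with ties broken by order of appearance; dominatR may handle ties differently. `quantile_sketch` and the toy matrix are hypothetical.

```r
# Textbook quantile normalization in base R (illustrative sketch only;
# quantile_sketch is a hypothetical helper, not part of dominatR).
quantile_sketch <- function(x) {
    x <- as.matrix(x)
    # Rank of every value within its own column (ties broken by position)
    ranks <- apply(x, 2, rank, ties.method = "first")
    # Reference distribution: mean of the k-th smallest value across columns
    reference <- rowMeans(apply(x, 2, sort))
    # Replace each value by the reference value at its rank
    out <- apply(ranks, 2, function(r) reference[r])
    dimnames(out) <- dimnames(x)
    out
}

# Toy data: 3 genes (rows) x 3 samples (columns), filled column by column
toy <- matrix(c(5, 2, 3,
                4, 1, 6,
                3, 4, 8),
              nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

qn <- quantile_sketch(toy)
```

After normalization every column contains the same set of values, and each gene keeps the rank it had within its sample.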

## Apply quantile normalization
airway_quantile <- quantile_normalization(airway_se)

## Check result
dim(assay(airway_quantile))
#> [1] 63677     8
summary(as.vector(assay(airway_quantile)))
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#>      0.0      0.0      0.0    344.4      9.6 361483.1
head(assay(airway_quantile)[1:5, 1:5])
#>                 SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516
#> ENSG00000000003    690.875    504.750    773.875     613.75   1010.000
#> ENSG00000000005      0.000      0.000      0.000       0.00      0.000
#> ENSG00000000419    468.875    582.375    550.625     552.00    516.875
#> ENSG00000000457    257.375    241.375    225.250     254.00    213.125
#> ENSG00000000460     58.000     65.250     31.500      53.75     67.750

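The values above remain on the count scale; no CPM scaling or log2 transformation is applied in this example.

Example Output: For the first gene in SRR1039508, the raw count of 679 becomes 690.875. Quantile normalization replaces each value with the average, across samples, of the values that share its within-sample rank, so the gene keeps its rank within the sample while every sample ends up with the same distribution of values. The normalized value by itself does not mean the gene is more highly expressed in this sample than in others.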

4.3 CPM Normalization

This section demonstrates how to apply Counts Per Million (CPM) normalization using the cpm_normalization() function in the dominatR package. It supports matrix, data.frame, and SummarizedExperiment formats.

Function Purpose:

The cpm_normalization() function rescales raw count data such that each column sums to one million, optionally followed by a log2 transformation. This makes count data comparable across samples of different sequencing depths.
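The computation can be sketched in base R as follows. This is an illustration, not the package source: `cpm_sketch` is a hypothetical helper, and the `+ 1` pseudocount in the log2 step is an assumption (it is consistent with the values printed in this section, where 32.900521 becomes 5.083236).

```r
# CPM in base R (illustrative sketch only; cpm_sketch is a hypothetical
# helper, and the +1 pseudocount is an assumption).
cpm_sketch <- function(counts, log_trans = FALSE) {
    counts <- as.matrix(counts)
    lib_size <- colSums(counts)   # total reads per sample
    # Divide every column by its library size, then scale to one million
    cpm <- sweep(counts, 2, lib_size, "/") * 1e6
    if (log_trans) log2(cpm + 1) else cpm
}

# Toy data: 3 genes (rows) x 2 samples with different sequencing depths
toy <- matrix(c(10, 30,  60,
                50, 50, 900),
              nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

cpm <- cpm_sketch(toy)
cpm_log <- cpm_sketch(toy, log_trans = TRUE)
colSums(cpm)   # each column sums to one million
```

Without the log transform, every column sums to one million, so a value can be read as the gene's share of the sample's reads.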

4.3.1 Example 1: Normalize a data.frame

df <- as.data.frame(assay(airway))
# Normalize without log2-transform
df_cpm <- cpm_normalization(df, log_trans = FALSE)
head(df_cpm[, 1:5])
#>                 SRR1039508 SRR1039509  SRR1039512 SRR1039513  SRR1039516
#> ENSG00000000003  32.900521  23.817776 34.43970525  26.906868 46.54699807
#> ENSG00000000005   0.000000   0.000000  0.00000000   0.000000  0.00000000
#> ENSG00000000419  22.628193  27.379809 24.49834703  24.071095 24.00974329
#> ENSG00000000457  12.598138  11.217747 10.37530639  10.815506 10.02110240
#> ENSG00000000460   2.907263   2.924057  1.57799337   2.308187  3.19039178
#> ENSG00000000938   0.000000   0.000000  0.07889967   0.000000  0.04090246

# Normalize with log2-transform
df_cpm_log <- cpm_normalization(df, log_trans = TRUE)
head(df_cpm_log[, 1:5])
#>                 SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516
#> ENSG00000000003   5.083236   4.633302  5.1472947   4.802548 5.57128235
#> ENSG00000000005   0.000000   0.000000  0.0000000   0.000000 0.00000000
#> ENSG00000000419   4.562437   4.826793  4.6723318   4.647953 4.64441834
#> ENSG00000000457   3.765337   3.610906  3.5078335   3.562609 3.46219663
#> ENSG00000000460   1.966158   1.972346  1.3662486   1.726041 2.06708514
#> ENSG00000000938   0.000000   0.000000  0.1095607   0.000000 0.05783488

4.3.2 Example 2: Normalize a SummarizedExperiment

library(SummarizedExperiment)

# Apply in-place normalization (overwrite assay)
se1 <- cpm_normalization(airway, log_trans = FALSE)
head(assay(se1))
#>                 SRR1039508 SRR1039509  SRR1039512 SRR1039513  SRR1039516
#> ENSG00000000003  32.900521  23.817776 34.43970525  26.906868 46.54699807
#> ENSG00000000005   0.000000   0.000000  0.00000000   0.000000  0.00000000
#> ENSG00000000419  22.628193  27.379809 24.49834703  24.071095 24.00974329
#> ENSG00000000457  12.598138  11.217747 10.37530639  10.815506 10.02110240
#> ENSG00000000460   2.907263   2.924057  1.57799337   2.308187  3.19039178
#> ENSG00000000938   0.000000   0.000000  0.07889967   0.000000  0.04090246
#>                 SRR1039517 SRR1039520 SRR1039521
#> ENSG00000000003  33.973415  40.259015  27.026857
#> ENSG00000000005   0.000000   0.000000   0.000000
#> ENSG00000000419  25.926226  21.802609  24.002873
#> ENSG00000000457  10.740401  12.182273  10.820193
#> ENSG00000000460   2.044246   3.973617   2.834985
#> ENSG00000000938   0.000000   0.000000   0.000000

# Save to a new assay slot
se2 <- cpm_normalization(airway, log_trans = TRUE, new_assay_name = 
                            "cpm_logged")
head(assay(se2, "cpm_logged"))
#>                 SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516
#> ENSG00000000003   5.083236   4.633302  5.1472947   4.802548 5.57128235
#> ENSG00000000005   0.000000   0.000000  0.0000000   0.000000 0.00000000
#> ENSG00000000419   4.562437   4.826793  4.6723318   4.647953 4.64441834
#> ENSG00000000457   3.765337   3.610906  3.5078335   3.562609 3.46219663
#> ENSG00000000460   1.966158   1.972346  1.3662486   1.726041 2.06708514
#> ENSG00000000938   0.000000   0.000000  0.1095607   0.000000 0.05783488
#>                 SRR1039517 SRR1039520 SRR1039521
#> ENSG00000000003   5.128187   5.366637   4.808738
#> ENSG00000000005   0.000000   0.000000   0.000000
#> ENSG00000000419   4.750940   4.511127   4.644022
#> ENSG00000000457   3.553410   3.720527   3.563182
#> ENSG00000000460   1.606085   2.314295   1.939221
#> ENSG00000000938   0.000000   0.000000   0.000000

4.3.3 Example 3: Normalize a custom assay

new_counts <- matrix(sample(1:100000, nrow(airway) * ncol(airway), TRUE),
                    nrow = nrow(airway))
rownames(new_counts) <- rownames(airway)
colnames(new_counts) <- colnames(airway)

assay(airway, "new_raw") <- new_counts

se3 <- cpm_normalization(airway, assay_name = "new_raw", new_assay_name = 
                            "cpm_new_raw")
head(assay(se3, "cpm_new_raw"))
#>                 SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516
#> ENSG00000000003  14.788145   3.498854   2.971905   6.333742  14.536293
#> ENSG00000000005  20.119063  27.425499  14.233632  28.947764  15.390852
#> ENSG00000000419   5.239054   9.119207  28.941853   9.729289  18.336495
#> ENSG00000000457  18.404801  17.627199  10.962249  15.116710  17.428310
#> ENSG00000000460  25.706696   5.218742   4.729612  13.200591   0.275027
#> ENSG00000000938  22.719872  21.485240  23.634589   5.353300   1.202028
#>                 SRR1039517 SRR1039520 SRR1039521
#> ENSG00000000003  15.461546   25.35930  10.972204
#> ENSG00000000005  31.251801   18.95559   9.578674
#> ENSG00000000419  18.533049   21.64368  13.116313
#> ENSG00000000457  24.701435   18.33327  18.638030
#> ENSG00000000460  26.392610   11.94689  26.177399
#> ENSG00000000938   4.932203   18.24914   7.054882

The output of cpm_normalization() depends on the input type:

· If you input a matrix or data.frame, it returns a numeric matrix where:

    · Each column sums to 1,000,000 (unless you apply the log transform).

    · Row and column names are preserved.

· If you input a SummarizedExperiment, it returns the same SE object with either the original assay overwritten or a new assay added (if new_assay_name is specified).

Example Output: The first gene in SRR1039508 has a raw count of 679. After CPM normalization its value is 32.900521, which implies a library size of about 20.6 million reads for this sample. With log_trans = TRUE the value becomes 5.083236, consistent with log2(CPM + 1). Because each count is divided by its sample's total, the resulting values are comparable across samples with different sequencing depths.

4.4 RPKM Normalization

Reads per kilobase per million (RPKM) normalization adjusts for both gene length and sequencing depth, making it particularly useful for RNA-Seq data. RPKM helps compare gene expression levels across genes of different lengths.
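A base R sketch of the calculation is shown below, under the usual definition of RPKM (reads per million mapped reads, divided by gene length in kilobases). `rpkm_sketch` is a hypothetical helper, and the `+ 1` pseudocount in the log2 step is an assumption rather than something confirmed from the package.

```r
# RPKM in base R (illustrative sketch only; rpkm_sketch is a hypothetical
# helper, and the +1 pseudocount is an assumption).
rpkm_sketch <- function(counts, gene_length_bp, log_trans = FALSE) {
    counts <- as.matrix(counts)
    stopifnot(length(gene_length_bp) == nrow(counts))
    per_million <- colSums(counts) / 1e6   # sequencing depth in millions
    length_kb <- gene_length_bp / 1e3      # gene length in kilobases
    rpm <- sweep(counts, 2, per_million, "/")   # depth first
    rpkm <- sweep(rpm, 1, length_kb, "/")       # then gene length
    if (log_trans) log2(rpkm + 1) else rpkm
}

# Toy data: two genes with equal counts in s1 but different lengths
toy <- matrix(c(500, 500,
                100, 900),
              nrow = 2,
              dimnames = list(c("g1", "g2"), c("s1", "s2")))
gene_len <- c(g1 = 1000, g2 = 2000)   # base pairs

rpkm <- rpkm_sketch(toy, gene_len)
```

In sample s1 both genes have 500 reads, but g2 is twice as long, so its RPKM is half that of g1.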

## Calculate gene length
rowData(airway_se)$gene_length <- rowData(airway_se)$gene_seq_end - 
    rowData(airway_se)$gene_seq_start

## Apply RPKM normalization
airway_se_rpkm <- rpkm_normalization(airway_se, gene_length, log_trans = TRUE)

## Check the result
dim(assay(airway_se_rpkm))  # Check the dimensions
#> [1] 63677     8
summary(as.vector(assay(airway_se_rpkm)))  # Summary statistics for all values
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.0000  0.0000  0.0000  0.3134  0.1280 13.3663
head(assay(airway_se_rpkm)[1:5, 1:5])
#>                 SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516
#> ENSG00000000003 1.96574725 1.63406253 2.01510789 1.75562333 2.35376434
#> ENSG00000000005 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> ENSG00000000419 0.96736029 1.10825777 1.02446804 1.01161910 1.00976461
#> ENSG00000000457 0.35866816 0.32344632 0.30152042 0.31301889 0.29220123
#> ENSG00000000460 0.02168423 0.02180855 0.01181011 0.01724252 0.02377868

Example Output: The first gene in SRR1039508 has a raw count of 679 and a value of 1.96574725 after RPKM normalization. Because log_trans = TRUE was used, this is a log2-scale value; assuming the same log2(x + 1) transform seen in the CPM output, it corresponds to about 2.9 RPKM. Adjusting for both gene length and sequencing depth makes expression comparable across genes of different lengths.

4.5 TPM Normalization

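Transcripts per million (TPM) normalization first divides each count by the gene length and then rescales each sample so that its length-normalized values sum to one million. Like RPKM, it adjusts for gene length and sequencing depth, but the fixed per-sample total makes proportions easier to compare between samples.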
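As a rough illustration, TPM can be computed in base R as below, following the standard definition (length-normalize first, then scale each sample to one million). `tpm_sketch` is a hypothetical helper, and the `+ 1` pseudocount in the log2 step is an assumption.

```r
# TPM in base R (illustrative sketch only; tpm_sketch is a hypothetical
# helper, and the +1 pseudocount is an assumption).
tpm_sketch <- function(counts, gene_length_bp, log_trans = FALSE) {
    counts <- as.matrix(counts)
    stopifnot(length(gene_length_bp) == nrow(counts))
    length_kb <- gene_length_bp / 1e3
    rate <- sweep(counts, 1, length_kb, "/")          # reads per kilobase
    tpm <- sweep(rate, 2, colSums(rate), "/") * 1e6   # scale each sample
    if (log_trans) log2(tpm + 1) else tpm
}

# Toy data: two genes, two samples
toy <- matrix(c(500, 500,
                100, 900),
              nrow = 2,
              dimnames = list(c("g1", "g2"), c("s1", "s2")))
gene_len <- c(g1 = 1000, g2 = 2000)   # base pairs

tpm <- tpm_sketch(toy, gene_len)
colSums(tpm)   # each column sums to one million
```

Unlike RPKM, the untransformed TPM values in every sample add up to the same total.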

## Calculate gene_length if not provided
rowData(airway_se)$gene_length <- rowData(airway_se)$gene_seq_end - 
    rowData(airway_se)$gene_seq_start

## Apply TPM normalization
airway_tpm <- tpm_normalization(airway_se, log_trans = TRUE)

## Check result
dim(assay(airway_tpm))
#> [1] 63677     8
summary(as.vector(assay(airway_tpm)))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.0000  0.0000  0.0000  0.7951  0.7534 16.1408
head(assay(airway_tpm)[1:5, 1:5])
#>                 SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516
#> ENSG00000000003  4.4294986  3.9577411 4.55434051  4.2249660  4.8647104
#> ENSG00000000005  0.0000000  0.0000000 0.00000000  0.0000000  0.0000000
#> ENSG00000000419  2.9549906  3.1678707 3.11233806  3.0989031  2.9884095
#> ENSG00000000457  1.5828547  1.4524115 1.44301228  1.4877425  1.3427320
#> ENSG00000000460  0.1467549  0.1443756 0.08513068  0.1237197  0.1553897

Example Output: For the first gene in SRR1039508, the normalized value is 4.43. Because log_trans = TRUE was used, this is a log2-scale value rather than a raw TPM; assuming a log2(x + 1) transform, it corresponds to roughly 20.5 TPM, that is, about 20.5 of every million length-normalized transcripts in the sample.

5 Visualization Functions

Visual representation is essential for interpreting the structure, dominance, and variability of biological features across samples or conditions.

Our package offers a collection of entropy-based visualization functions designed for different analytical perspectives.

Let’s now explore each visualization function with real data examples.

5.1 plot_circle: Entropy-Magnitude Circle Plot

This function visualizes high-dimensional input (e.g., a gene expression matrix) using a polar coordinate system where:

  • Radial position represents Shannon entropy (distribution uniformity)

  • Angular position represents the dominant variable (feature with maximum value)

  • Point color represents either the dominant variable or an optional factor
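The two quantities behind the plot, per-row Shannon entropy and the dominant column, can be sketched in base R. This assumes base-2 logarithms, which gives a maximum entropy of log2(n), i.e. 3 for n = 8 columns; `row_entropy` and the toy matrix are hypothetical and not part of dominatR.

```r
# Per-row Shannon entropy and dominant column (illustrative sketch only;
# row_entropy is a hypothetical helper, base-2 logarithm assumed).
row_entropy <- function(x) {
    x <- as.matrix(x)
    p <- x / rowSums(x)                       # each row as proportions
    terms <- ifelse(p > 0, -p * log2(p), 0)   # treat 0 * log2(0) as 0
    rowSums(terms)                            # all-zero rows yield NA
}

toy <- rbind(specific = c(0, 10, 0, 0),   # expressed in one column only
             shared   = c(5, 5, 5, 5),    # evenly spread
             mixed    = c(2, 0, 8, 0))    # mostly one column
colnames(toy) <- paste0("Column_", 1:4)

entropy <- row_entropy(toy)
# Column holding the largest value in each row (first one wins on ties)
dominant <- colnames(toy)[max.col(toy, ties.method = "first")]
```

A row concentrated in one column has entropy 0, while a row spread evenly over four columns reaches the maximum of log2(4) = 2.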

5.1.1 Description:

This function is ideal for:

  • Visualizing multidimensional datasets (samples × features) in an interpretable 2D circular space.

  • Detecting samples/features with high entropy (irregularity) or high average expression.

  • Identifying mixed-behavior regions such as dense clusters or entropy-magnitude outliers.

  • Facilitating compact visualization across thousands of rows or columns.

5.1.2 Using a SummarizedExperiment input

Visualize genes from the airway dataset with specific filtering criteria. Each point represents a gene, colored by the sample in which it has the highest expression (dominant sample). We’ll highlight a specific gene of interest.

rowData(se)$gene_length <- rowData(se)$gene_seq_end - rowData(se)$gene_seq_start
se <- tpm_normalization(se, log_trans = TRUE, new_assay_name = "tpm_norm")
se <- se[1:1000, ]
#' Rename columns for consistency
colnames(se) <- paste('Column_', 1:8, sep ='')

5.1.2.1 Example 1: Default parameters with SE

plot_circle(
    x = se,
    n = 8,
    entropyrange     = c(0, 3),
    magnituderange   = c(0, Inf),
    label  = 'legend',
    output_table = FALSE,
    assay_name = 'tpm_norm'
)

5.1.2.2 Example 2: Low-entropy filtering (0-1.5)

plot_circle(
    x = se,
    n = 8,
    entropyrange     = c(0, 1.5),
    magnituderange   = c(0, Inf),
    label  = 'legend',
    output_table = FALSE,
    assay_name = 'tpm_norm'
)

5.1.2.3 Example 3: High-entropy filtering (2-3)

plot_circle(
    x = se,
    n = 8,
    entropyrange     = c(2, 3),
    magnituderange   = c(0, Inf),
    label  = 'legend',
    output_table = FALSE,
    assay_name = 'tpm_norm'
)

5.1.2.4 Example 4: Grouping by gene biotype

plot_circle(
    x = se,
    n = 8,
    column_variable_factor = 'gene_biotype',
    entropyrange     = c(2,3),
    magnituderange   = c(0, Inf),
    label  = 'legend',
    output_table = FALSE,
    assay_name = 'tpm_norm'
)

5.1.2.5 Example 5: Custom point size and filtering

plot_circle(
    x = se,
    n = 8,
    column_variable_factor = 'gene_biotype',
    point_size = 3,
    entropyrange     = c(0,1.5),
    magnituderange   = c(0, Inf),
    label  = 'legend',
    output_table = FALSE,
    assay_name = 'tpm_norm'
)

5.1.2.6 Example 6: Highlighting specific biotypes

# Emphasize protein-coding genes in orange
plot_circle(
    x = se,
    n = 8,
    column_variable_factor = 'gene_biotype',
    point_size = 3,
    entropyrange     = c(0,1.5),
    magnituderange   = c(0, Inf),
    label  = 'legend',
    output_table = FALSE,
    assay_name = 'tpm_norm',
    point_fill_colors = c('protein_coding' = 'orange'),
    point_line_colors = c('protein_coding' = 'orange')
)

5.1.2.7 Example 7: Retrieving plot data from SE


se_result <- plot_circle(
    x = se,
    n = 8,
    column_variable_factor = 'gene_biotype',
    point_size = 3,
    entropyrange     = c(0,1.5),
    magnituderange   = c(0, Inf),
    label  = 'legend',
    output_table = TRUE,
    assay_name = 'tpm_norm',
    point_fill_colors = c('protein_coding' = 'orange'),
    point_line_colors = c('protein_coding' = 'orange')
)

The result is a list of two objects:

  • se_result[[1]]: a ggplot2 object for visualization
  • se_result[[2]]: a data.frame with entropy, magnitude, etc.
se_result[[1]]

head(se_result[[2]])
#>                         Factor   Entropy      col       rad        deg
#> ENSG00000000005 protein_coding 0.0000000 Column_8 100.00000 -4.7123890
#> ENSG00000000938 protein_coding 0.9103799 Column_3  87.43514 -0.7853982
#> ENSG00000002726 protein_coding 0.0000000 Column_3 100.00000 -0.7853982
#> ENSG00000004809 protein_coding 0.0000000 Column_1 100.00000  0.7853982
#> ENSG00000004848 protein_coding 0.0000000 Column_7 100.00000 -3.9269908
#> ENSG00000004939 protein_coding 0.0000000 Column_7 100.00000 -3.9269908
#>                             x         y   labels   rand_deg     alpha
#> ENSG00000000005 -1.836970e-14 100.00000 Column_8 -4.7463259 1.0000000
#> ENSG00000000938  6.182598e+01 -61.82598 Column_3 -0.8193351 0.8862025
#> ENSG00000002726  7.071068e+01 -70.71068 Column_3 -0.8096388 1.0000000
#> ENSG00000004809  7.071068e+01  70.71068 Column_1  0.7611575 1.0000000
#> ENSG00000004848 -7.071068e+01  70.71068 Column_7 -3.8833576 1.0000000
#> ENSG00000004939 -7.071068e+01  70.71068 Column_7 -3.8833576 1.0000000

5.1.3 Using a matrix or data.frame input

You can also use a raw matrix or data frame, such as one extracted from the assay slot of a SummarizedExperiment object:

#' First we extract the normalized data as a data.frame:

df <- assay(se, 'tpm_norm') |> as.data.frame()
colnames(df) <- paste('Column_', 1:8, sep ='')

5.1.3.1 Example 1: Default parameters

plot_circle(
    x = df,
    n = 8,
    entropyrange     = c(0, 3),
    magnituderange   = c(0, Inf),
    label  = 'legend', 
    output_table = FALSE
)

5.1.3.2 Example 2: Filtering high-entropy genes

# Genes with entropy between 2-3 (more balanced expression)
plot_circle(
    x = df,
    n = 8,
    entropyrange     = c(2, 3),
    magnituderange   = c(0, Inf),
    label  = 'legend', 
    output_table = FALSE
)

5.1.3.3 Example 3: Filtering low-entropy genes

#' Genes with entropy between 0-2 (more specialized expression)
plot_circle(
    x = df,
    n = 8,
    entropyrange     = c(0, 2),
    magnituderange   = c(0, Inf),
    label  = 'legend', 
    output_table = FALSE
)

5.1.3.4 Example 4: Curved label aesthetics

plot_circle(
    x = df,
    n = 8,
    entropyrange     = c(0, 2),
    magnituderange   = c(0, Inf),
    label  = 'curve',
    output_table = FALSE
)

5.1.3.5 Example 5: Highlighting specific samples

#' Emphasize expression dominance in Columns 1, 3, and 5
plot_circle(
    x = df,
    n = 8,
    entropyrange     = c(0, 2),
    magnituderange   = c(0, Inf),
    label  = 'legend',
    output_table = FALSE,
    background_alpha_polygon = 0.2,
    background_na_polygon = 'transparent',
    background_polygon = c('Column_1'  = 'indianred',
                        'Column_3' = 'lightblue',
                        'Column_5' = 'lightgreen'),
    point_fill_colors = c('Column_1'  = 'darkred',
                        'Column_3' = 'darkblue',
                        'Column_5' = 'darkgreen'),
    point_line_colors = c('Column_1'  = 'black',
                        'Column_3' = 'black',
                        'Column_5' = 'black')
)


# Using factor variables

#' Add a factor column for grouping
set.seed(123)  # For reproducibility
df$factor <- sample(c('A', 'B', 'C', 'D'), size = nrow(df), replace = TRUE)

5.1.3.6 Example 6: Visualizing by factor groups

plot_circle(
    x = df,
    n = 8,
    column_variable_factor = 'factor',
    entropyrange     = c(0, 2),
    magnituderange   = c(0, Inf),
    label  = 'legend',
    output_table = FALSE,
    background_alpha_polygon = 0.2,
    background_na_polygon = 'transparent',
    background_polygon = c('Column_1'  = 'indianred',
                        'Column_3' = 'lightblue',
                        'Column_5' = 'lightgreen')
)

5.1.3.7 Example 7: Custom factor colors

plot_circle(
    x = df,
    n = 8,
    column_variable_factor = 'factor',
    entropyrange     = c(0, 2),
    magnituderange   = c(0, Inf),
    label  = 'curve',
    output_table = FALSE,
    background_alpha_polygon = 0.02,
    background_na_polygon = 'transparent',
    point_fill_colors = c('A' = 'black',
                        'B' = 'gray',
                        'C' = 'white',
                        'D' = 'orange'),
    point_line_colors = c('A' = 'black',
                        'B' = 'gray',
                        'C' = 'white',
                        'D' = 'orange')
)

5.1.3.8 Example 8: Retrieving plot data

#' When `output_table = TRUE`, returns a list containing:
#' 1. ggplot object
#' 2. Data frame with entropy, magnitude, and dominance information
plot_result <- plot_circle(
    x = df,
    n = 8,
    point_size =  2,
    column_variable_factor = 'factor',
    entropyrange     = c(0, 2),
    magnituderange   = c(0, Inf),
    label  = 'curve',
    output_table = TRUE,
    background_alpha_polygon = 0.02,
    background_na_polygon = 'transparent',
    point_fill_colors = c('A' = 'black',
                        'B' = 'gray',
                        'C' = 'white',
                        'D' = 'orange'),
    point_line_colors = c('A' = 'black',
                        'B' = 'gray',
                        'C' = 'white',
                        'D' = 'orange')
)

# View plot
plot_result[[1]]


# View data
head(plot_result[[2]])
#>                 Factor   Entropy      col       rad        deg             x
#> ENSG00000000005      C 0.0000000 Column_4 100.00000 -1.5707963  6.123234e-15
#> ENSG00000000938      B 0.9103799 Column_3  87.43514 -0.7853982  6.182598e+01
#> ENSG00000002587      A 1.5059369 Column_1  73.71299  0.7853982  5.212296e+01
#> ENSG00000002726      C 0.0000000 Column_3 100.00000 -0.7853982  7.071068e+01
#> ENSG00000003147      A 1.9042011 Column_7  60.81406 -3.9269908 -4.300204e+01
#> ENSG00000004809      D 0.0000000 Column_3 100.00000 -0.7853982  7.071068e+01
#>                          y   labels   rand_deg     alpha
#> ENSG00000000005 -100.00000 Column_4 -1.5465556 1.0000000
#> ENSG00000000938  -61.82598 Column_3 -0.8096388 0.8862025
#> ENSG00000002587   52.12296 Column_1  0.7514612 0.8117579
#> ENSG00000002726  -70.71068 Column_3 -0.7417649 1.0000000
#> ENSG00000003147   43.00204 Column_7 -3.9124464 0.7619749
#> ENSG00000004809  -70.71068 Column_3 -0.8290314 1.0000000

5.1.4 Output interpretation

The returned data frame (the second list element, e.g. plot_result[[2]]) contains the following columns:

  • Factor: the grouping value taken from column_variable_factor, when one is supplied.
  • Entropy: the Shannon entropy of each row (gene), computed across its columns (samples).
  • col: the dominant column, i.e. the sample in which the row has its highest value.
  • rad: the radial distance in the plot. In the output above it is 100 for rows with zero entropy and shrinks toward the center as entropy increases.
  • deg: the plotting angle (in radians) of the dominant column's axis.
  • x, y: the Cartesian coordinates corresponding to (rad, deg), used internally by geom_point().
  • labels: the text labels (e.g. sample names) when label = "legend" or variables_highlight is set.
  • rand_deg: deg plus a small random angular offset that spreads points within a sector (reproducible if you call set.seed() beforehand).
  • alpha: the point transparency. In the output above it is 1 for rows with zero entropy and decreases as entropy increases.
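The relationship between these columns can be checked against three rows printed earlier in this section. The sketch below recomputes x and y from rad and deg with the standard polar-to-Cartesian conversion. The expressions for `rad_guess` and `alpha_guess` are only observations that happen to reproduce the printed numbers for n = 8; they are not taken from the package source and should be treated as assumptions.

```r
# Values copied from the printed plot_result[[2]] output (n = 8 columns)
printed <- data.frame(
    Entropy = c(0.9103799, 1.5059369, 1.9042011),
    rad     = c(87.43514, 73.71299, 60.81406),
    deg     = c(-0.7853982, 0.7853982, -3.9269908),
    alpha   = c(0.8862025, 0.8117579, 0.7619749),
    row.names = c("ENSG00000000938", "ENSG00000002587", "ENSG00000003147")
)
n <- 8

# Standard polar-to-Cartesian conversion
x <- printed$rad * cos(printed$deg)
y <- printed$rad * sin(printed$deg)

# Assumed relationships that match the printed rows (not confirmed
# against the dominatR source code)
rad_guess   <- 100 * (1 - (2^printed$Entropy - 1) / (n - 1))
alpha_guess <- 1 - printed$Entropy / n

round(cbind(x, y, rad_guess, alpha_guess), 5)
```

The recomputed x and y agree with the printed coordinates, which shows that points are placed from rad and deg, while rand_deg carries the jittered angle.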

5.1.5 Biological Interpretation:

Dominant Sample:

  • Shows which sample has the highest expression for each gene.
  • Useful for identifying sample-specific expression patterns.

Radial Position:

  • Genes near the edge: highly specific to one sample (low entropy).
  • Genes near the center: similar expression across samples (high entropy).

Sector Position:

  • Each wedge represents a sample.
  • Genes in a sample’s wedge have their highest expression in that sample.

5.2 plot_circle_frequency(): Frequency-Stratified Entropy-Magnitude Visualization

This function builds upon plot_circle() by stratifying samples into frequency bins and visualizing entropy-magnitude patterns for each bin separately. It is useful when your dataset contains variables/features with different levels of occurrence or sparsity (e.g., expressed vs. non-expressed genes).

5.2.1 Description

This function is ideal for:

  • Identifying highly prevalent genes/features across a cohort.

  • Screening for outlier or inactive variables.

  • Visually comparing distributions in a compact format.

5.2.2 Using a SummarizedExperiment object

# Data preprocessing
rowData(se)$gene_length <- rowData(se)$gene_seq_end - rowData(se)$gene_seq_start
se <- tpm_normalization(se, log_trans = TRUE, new_assay_name = 'tpm_norm')
se <- se[1:1000, ]


# Creating the circle plot data

# First we create the circle plot with output_table = TRUE to get 
# the data needed for the frequency plot. We'll use gene biotype as our 
# factor variable.

circle_data <- plot_circle(
    x = se,
    n = 8,
    column_variable_factor = 'gene_biotype',
    entropyrange = c(0, Inf),
    magnituderange = c(0, Inf),
    label = 'legend',
    output_table = TRUE,
    assay_name = 'tpm_norm'
)

5.2.2.1 Example 1: Default parameters (combined panel)

freq_plot_default <- plot_circle_frequency(
    n = 8,
    circle = circle_data,
    single = TRUE,
    legend = TRUE,
    numb_columns = 1,
    filter_class = NULL,
    point_size = 2
)

# Display the plot
freq_plot_default[[1]]


# View aggregated data
head(freq_plot_default[[2]])
#>   bin               Factor n proportion
#> 1   1 processed_transcript 1  0.1428571
#> 2   2 processed_transcript 0  0.0000000
#> 3   3 processed_transcript 0  0.0000000
#> 4   4 processed_transcript 1  0.1428571
#> 5   5 processed_transcript 0  0.0000000
#> 6   6 processed_transcript 0  0.0000000

5.2.2.2 Example 2: Faceted by factor

# Visualize each factor level in separate panels

plot_circle_frequency(
    n = 8,
    circle = circle_data,
    single = FALSE,
    legend = TRUE,
    numb_columns = 3,  # Arrange in 3 columns
    filter_class = NULL,
    point_size = 2
)
#> $plot_stat

#> 
#> $data
#>    bin               Factor   n proportion
#> 1    1 processed_transcript   1 0.14285714
#> 2    2 processed_transcript   0 0.00000000
#> 3    3 processed_transcript   0 0.00000000
#> 4    4 processed_transcript   1 0.14285714
#> 5    5 processed_transcript   0 0.00000000
#> 6    6 processed_transcript   0 0.00000000
#> 7    7 processed_transcript   1 0.14285714
#> 8    8 processed_transcript   4 0.57142857
#> 9    1       protein_coding  71 0.07178969
#> 10   2       protein_coding  13 0.01314459
#> 11   3       protein_coding  13 0.01314459
#> 12   4       protein_coding  18 0.01820020
#> 13   5       protein_coding  28 0.02831143
#> 14   6       protein_coding  14 0.01415571
#> 15   7       protein_coding  37 0.03741153
#> 16   8       protein_coding 795 0.80384226
#> 17   1           pseudogene   1 0.25000000
#> 18   2           pseudogene   1 0.25000000
#> 19   3           pseudogene   0 0.00000000
#> 20   4           pseudogene   0 0.00000000
#> 21   5           pseudogene   2 0.50000000
#> 22   6           pseudogene   0 0.00000000
#> 23   7           pseudogene   0 0.00000000
#> 24   8           pseudogene   0 0.00000000

5.2.2.3 Example 3: Filtering specific classes

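# Focus on specific gene biotypes. snoRNA and miRNA do not occur in this
# 1000-gene subset, so only protein_coding rows are returned.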

plot_circle_frequency(
    n = 8,
    circle = circle_data,
    single = FALSE,
    legend = TRUE,
    numb_columns = 1,  # Single column layout
    filter_class = c('protein_coding', 'snoRNA', 'miRNA'),
    point_size = 3  # Larger points for emphasis
)
#> $plot_stat

#> 
#> $data
#>    bin         Factor   n proportion
#> 9    1 protein_coding  71 0.07178969
#> 10   2 protein_coding  13 0.01314459
#> 11   3 protein_coding  13 0.01314459
#> 12   4 protein_coding  18 0.01820020
#> 13   5 protein_coding  28 0.02831143
#> 14   6 protein_coding  14 0.01415571
#> 15   7 protein_coding  37 0.03741153
#> 16   8 protein_coding 795 0.80384226

5.2.2.4 Example 4: Combined plot with custom filtering

# Create a combined plot showing only selected classes. miRNA and lincRNA
# do not occur in this 1000-gene subset, so only protein_coding appears.

plot_circle_frequency(
    n = 8,
    circle = circle_data,
    single = TRUE,
    legend = TRUE,
    numb_columns = 1,
    filter_class = c('protein_coding', 'miRNA', 'lincRNA'),
    point_size = 3
)
#> $plot_stat

#> 
#> $data
#>    bin         Factor   n proportion
#> 9    1 protein_coding  71 0.07178969
#> 10   2 protein_coding  13 0.01314459
#> 11   3 protein_coding  13 0.01314459
#> 12   4 protein_coding  18 0.01820020
#> 13   5 protein_coding  28 0.02831143
#> 14   6 protein_coding  14 0.01415571
#> 15   7 protein_coding  37 0.03741153
#> 16   8 protein_coding 795 0.80384226

5.2.3 Using a matrix or data.frame input

# Create data.frame version
df <- assay(se, 'tpm_norm') |> as.data.frame()
colnames(df) <- paste('Sample', 1:8, sep = '_')
df$gene_biotype <- rowData(se)$gene_biotype

# Create circle plot data
circle_df <- plot_circle(
    x = df,
    n = 8,
    column_variable_factor = 'gene_biotype',
    entropyrange = c(0, Inf),
    magnituderange = c(0, Inf),
    label = 'legend',
    output_table = TRUE
)

5.2.3.1 Example 5: Data.frame input with faceting

plot_circle_frequency(
    n = 8,
    circle = circle_df,
    single = FALSE,
    legend = TRUE,
    numb_columns = 2,
    filter_class = NULL,
    point_size = 1.5
)
#> $plot_stat

#> 
#> $data
#>    bin               Factor   n proportion
#> 1    1 processed_transcript   1 0.14285714
#> 2    2 processed_transcript   0 0.00000000
#> 3    3 processed_transcript   0 0.00000000
#> 4    4 processed_transcript   1 0.14285714
#> 5    5 processed_transcript   0 0.00000000
#> 6    6 processed_transcript   0 0.00000000
#> 7    7 processed_transcript   1 0.14285714
#> 8    8 processed_transcript   4 0.57142857
#> 9    1       protein_coding  71 0.07178969
#> 10   2       protein_coding  13 0.01314459
#> 11   3       protein_coding  13 0.01314459
#> 12   4       protein_coding  18 0.01820020
#> 13   5       protein_coding  28 0.02831143
#> 14   6       protein_coding  14 0.01415571
#> 15   7       protein_coding  37 0.03741153
#> 16   8       protein_coding 795 0.80384226
#> 17   1           pseudogene   1 0.25000000
#> 18   2           pseudogene   1 0.25000000
#> 19   3           pseudogene   0 0.00000000
#> 20   4           pseudogene   0 0.00000000
#> 21   5           pseudogene   2 0.50000000
#> 22   6           pseudogene   0 0.00000000
#> 23   7           pseudogene   0 0.00000000
#> 24   8           pseudogene   0 0.00000000

5.2.4 Output interpretation

The call to plot_circle_frequency(...) returns a list of two elements:

  • plot_stat: the frequency plot, also accessible as [[1]].

  • data: the aggregated table behind the plot, also accessible as [[2]].

The returned table has one row per combination of bin and factor level and includes:

  • bin: bin index (1 to 8 in the examples above)

  • Factor: level of column_variable_factor (e.g., protein_coding)

  • n: number of observations (e.g., genes) of that factor level in the bin

  • proportion: n divided by the total number of observations for that factor level
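The proportion column in the tables printed above can be checked by hand. The sketch below rebuilds it in base R from the n counts shown in Example 2, assuming (as the printed values suggest) that each count is divided by the total for its factor level. It does not call dominatR.

```r
# Counts copied from the `$data` table printed in Example 2
freq <- data.frame(
    bin    = rep(1:8, times = 3),
    Factor = rep(c("processed_transcript", "protein_coding", "pseudogene"),
                 each = 8),
    n      = c(1, 0, 0, 1, 0, 0, 1, 4,
               71, 13, 13, 18, 28, 14, 37, 795,
               1, 1, 0, 0, 2, 0, 0, 0)
)

# ave() returns, for every row, the sum of n within that row's factor level
freq$proportion <- freq$n / ave(freq$n, freq$Factor, FUN = sum)

head(freq)
```

The counts add up to the 1000 genes kept in the subset, and the proportions within each factor level sum to one.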

5.3 plot_abacus(): Abacus Plot of Sample/Feature Entropy Profiles

This function creates an abacus plot that classifies observations into dominance groups using entropy-based methods. It visualizes which features (e.g., genes) are dominant in which samples across different expression dominance categories.

5.3.1 Description

This function is ideal for:

  • Classifying features into expression dominance categories (e.g., low, medium, high)

  • Identifying features that consistently dominate across multiple samples

  • Highlighting features of interest in the context of their dominance classification

  • Visualizing the distribution of dominance classes across samples in a compact format


## Data preparation
se <- airway[1:1000, ]  # Subset for faster computation
rowData(se)$gene_length <- rowData(se)$gene_seq_end - rowData(se)$gene_seq_start
se <- tpm_normalization(se, log_trans = TRUE, new_assay_name = 'tpm_norm')

# Prepare data frame
df_abacus <- as.data.frame(assay(se, "tpm_norm"))
df_abacus$gene_id <- rownames(df_abacus)
df_abacus <- df_abacus[, c("gene_id", setdiff(colnames(df_abacus), "gene_id"))]
head(df_abacus[, 1:5])
#>                         gene_id SRR1039508 SRR1039509 SRR1039512 SRR1039513
#> ENSG00000000003 ENSG00000000003  10.020022   9.574117  10.022841   9.817259
#> ENSG00000000005 ENSG00000000005   0.000000   0.000000   0.000000   0.000000
#> ENSG00000000419 ENSG00000000419   8.417713   8.711587   8.468987   8.593571
#> ENSG00000000457 ENSG00000000457   6.668775   6.522540   6.329381   6.537197
#> ENSG00000000460 ENSG00000000460   2.679289   2.702931   1.929143   2.474782
#> ENSG00000000938 ENSG00000000938   0.000000   0.000000   1.111869   0.000000

5.3.2 Using a matrix or data.frame input

# Generate plot with minimal parameters
abacus_res <- plot_abacus(
    data              = df_abacus,               
    n                 = ncol(df_abacus) - 1,     
    x_variable        = "gene_id",               
    y_variables       = colnames(df_abacus)[-1], 
    percentiles       = 4,                       
    title             = "Gene Expression Dominance", 
    point_size        = 2,                      
    single            = TRUE                  
)

abacus_res[[1]]

5.3.3 Output interpretation

The call to plot_abacus(...) returns a list of two elements (stored in abacus_res above):

  • abacus_res[[1]]: a ggplot2 object for the abacus-style dominance plot.

  • abacus_res[[2]]: a data.frame with one row per point drawn on the plot, containing:

  • X_axis
    The identifier you passed as x_variable (e.g. gene ID).

  • Variable
    The name of the variable (column) each point belongs to (one of your y_variables).

  • Qentropy
    The computed categorical entropy (Qentropy) for that feature-variable combination.

  • bin
    A factor giving the percentile bin (e.g. “0.25”, “0.50”, etc.) into which that Qentropy falls.
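To make the bin column concrete, the following base R sketch assigns a numeric vector to quartile bins, mirroring percentiles = 4 in the call above. The score values are hypothetical, and the use of quantile() and cut() is an assumption about one reasonable way to form such bins; dominatR's internal implementation may differ.

```r
# Hypothetical entropy-like scores (illustrative values, not dominatR output)
scores <- c(0.2, 0.9, 1.4, 2.1, 2.6, 2.8, 2.95, 3.0)

# Number of bins, mirroring `percentiles = 4` in the call above
percentiles <- 4
probs <- seq(0, 1, length.out = percentiles + 1)  # 0, 0.25, 0.5, 0.75, 1

# Break points at the empirical quantiles of the scores
breaks <- quantile(scores, probs = probs)

# Label each bin by its upper percentile
bin <- cut(scores, breaks = breaks, labels = probs[-1], include.lowest = TRUE)

table(bin)
```

Because the breaks are quantiles of the scores themselves, each of the four bins receives the same number of values in this toy example.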

5.4 plot_rope(): Rope Plot for Binary Feature Dominance

This function compares two numeric vectors (e.g., expression in Condition A vs. B) using a “rope-like” 1D dominance visualization. Each observation (row) is classified by which of the two columns dominates, and points can optionally be filtered by entropy or magnitude thresholds.

5.4.1 Description

This function is ideal for:

  • Comparing two groups of measurements across matched samples or features.

  • Detecting dominance shifts (e.g., gene up/down regulation between two conditions).

  • Filtering samples based on entropy or effect size before plotting.

## Data preparation
se <- airway[1:1000, ]  # Subset for faster computation
rowData(se)$gene_length <- rowData(se)$gene_seq_end - rowData(se)$gene_seq_start
se <- tpm_normalization(se, log_trans = TRUE, new_assay_name = 'tpm_norm')
df <- as.data.frame(assay(se, 'tpm_norm'))
sample1 <- "SRR1039508"
sample2 <- "SRR1039516"

5.4.2 Using a SummarizedExperiment object

5.4.2.1 Example 1: Low entropy + Medium expression

res_rope = plot_rope(
    x = se,
    column_name = c(sample1, sample2),
    col = c('lightgreen', 'indianred'),
    entropyrange = c(0, 0.1),
    maxvaluerange = c(4, 8),
    title = "SE Input: Low Entropy + Medium Expression"
)

5.4.2.2 Example 2: Medium entropy + Medium expression

res_rope = plot_rope(
    x = se,
    column_name = c(sample1, sample2),
    col = c('lightgreen', 'indianred'),
    entropyrange = c(0.1, 0.8),
    maxvaluerange = c(4, 8),
    title = "SE Input: Medium Entropy + Medium Expression"
)

5.4.2.3 Example 3: High entropy + Medium expression

res_rope = plot_rope(
    x = se,
    column_name = c(sample1, sample2),
    col = c('lightgreen', 'indianred'),
    entropyrange = c(0.8, 1),
    maxvaluerange = c(4, 8),
    title = "SE Input: High Entropy + Medium Expression"
)

5.4.2.4 Example 4: Retrieve output data

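# No assay_name is supplied here, so columns a and b in the output below
# hold the raw counts rather than the 'tpm_norm' values
res_rope = plot_rope(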
    x = se,
    column_name = c(sample1, sample2),
    output_table = TRUE,
    col = c('lightgreen', 'indianred'),
    entropyrange = c(0.8, 1),
    maxvaluerange = c(4, 8)
)


str(res_rope)
#> 'data.frame':    1000 obs. of  7 variables:
#>  $ a       : int  679 0 467 260 60 0 3251 1433 519 394 ...
#>  $ b       : int  1138 0 587 245 78 1 6721 1424 820 658 ...
#>  $ comx    : num  0.2526 0 0.1139 -0.0297 0.1304 ...
#>  $ comy    : num  0.188 -0.053 -0.108 0.047 -0.097 ...
#>  $ color   : chr  "whitesmoke" "whitesmoke" "whitesmoke" "whitesmoke" ...
#>  $ maxvalue: int  1138 0 587 260 78 1 6721 1433 820 658 ...
#>  $ entropy : num  0.953 0 0.991 0.999 0.988 ...
head(res_rope)
#>                   a    b        comx    comy      color maxvalue   entropy
#> ENSG00000000003 679 1138  0.25261420  0.1880 whitesmoke     1138 0.9534655
#> ENSG00000000005   0    0  0.00000000 -0.0530 whitesmoke        0 0.0000000
#> ENSG00000000419 467  587  0.11385199 -0.1085 whitesmoke      587 0.9906294
#> ENSG00000000457 260  245 -0.02970297  0.0470 whitesmoke      260 0.9993635
#> ENSG00000000460  60   78  0.13043478 -0.0970 whitesmoke       78 0.9876925
#> ENSG00000000938   0    1  1.00000000  0.1270 whitesmoke        1 0.0000000
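Every row in this table is colored whitesmoke. A plausible reading, sketched below in base R, is that a point is highlighted only when both its entropy and its maxvalue fall inside the requested ranges. The inclusive bounds and the helper name in_range are assumptions made for illustration, not dominatR's code; the values are copied from the head(res_rope) output above.

```r
# Values copied from the head(res_rope) output above
maxvalue <- c(1138, 0, 587, 260, 78, 1)
entropy  <- c(0.9534655, 0.0000000, 0.9906294, 0.9993635, 0.9876925, 0.0000000)

# Ranges passed to plot_rope() in this example
entropyrange  <- c(0.8, 1)
maxvaluerange <- c(4, 8)

# Hypothetical helper: TRUE where x lies inside [range[1], range[2]]
in_range <- function(x, range) x >= range[1] & x <= range[2]

# A point is highlighted only if it passes both filters
highlighted <- in_range(entropy, entropyrange) & in_range(maxvalue, maxvaluerange)

data.frame(maxvalue, entropy, highlighted)
```

Four of the six genes pass the entropy filter, but none has a maxvalue between 4 and 8, because a and b here are raw counts rather than log-transformed TPM values.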

5.4.3 Using a matrix or data.frame input

5.4.3.1 Example 1: Default behavior

res_rope = plot_rope(
    x = df,
    column_name = c(sample1, sample2),
    title = "Default Rope Plot"
)

5.4.3.2 Example 2. Custom colors

res_rope = plot_rope(
    x = df,
    column_name = c(sample1, sample2),
    col = c('darkgreen', 'darkred'),
    title = "Custom Colors"
)

head(res_rope)
#>                         a          b         comx    comy     color   maxvalue
#> ENSG00000000003 10.020022 10.4880532  0.022821808  0.1070   darkred 10.4880532
#> ENSG00000000005  0.000000  0.0000000  0.000000000  0.0025   darkred  0.0000000
#> ENSG00000000419  8.417713  8.4708971  0.003149127  0.1290   darkred  8.4708971
#> ENSG00000000457  6.668775  6.3104879 -0.027604619 -0.1120 darkgreen  6.6687755
#> ENSG00000000460  2.679289  2.7657567  0.015880043  0.1540   darkred  2.7657567
#> ENSG00000000938  0.000000  0.6916003  1.000000000 -0.1340   darkred  0.6916003
#>                   entropy
#> ENSG00000000003 0.9996243
#> ENSG00000000005 0.0000000
#> ENSG00000000419 0.9999928
#> ENSG00000000457 0.9994503
#> ENSG00000000460 0.9998181
#> ENSG00000000938 0.0000000

5.4.3.3 Example 3. Low entropy filtering (0-0.1)

res_rope = plot_rope(
    x = df,
    column_name = c(sample1, sample2),
    col = c('darkgreen', 'darkred'),
    entropyrange = c(0, 0.1),
    title = "Low Entropy Genes (0-0.1)"
)

5.4.3.4 Example 4. Low entropy + High expression

res_rope = plot_rope(
    x = df,
    column_name = c(sample1, sample2),
    col = c('darkgreen', 'darkred'),
    entropyrange = c(0, 0.1),
    maxvaluerange = c(2, Inf),
    title = "Low Entropy + High Expression"
)

5.4.3.5 Example 5. Low Entropy + Medium Expression

res_rope = plot_rope(
    x = df,
    column_name = c(sample1, sample2),
    col = c('darkgreen', 'darkred'),
    entropyrange = c(0, 0.1),
    maxvaluerange = c(4, 8),
    title = "Low Entropy + Medium Expression"
)

# res_rope is a data.frame, so its second column is b (the SRR1039516 values)
head(res_rope$b)
#> [1] 10.4880532  0.0000000  8.4708971  6.3104879  2.7657567  0.6916003

5.4.3.6 Example 6. Medium Entropy + Medium Expression

res_rope = plot_rope(
    x = df,
    column_name = c(sample1, sample2),
    col = c('darkgreen', 'darkred'),
    entropyrange = c(0.1, 0.8),
    maxvaluerange = c(4, 8),
    title = "Medium Entropy + Medium Expression"
)

5.4.3.7 Example 7. High Entropy + Medium Expression

res_rope = plot_rope(
    x = df,
    column_name = c(sample1, sample2),
    col = c('darkgreen', 'darkred'),
    entropyrange = c(0.8, 1),
    maxvaluerange = c(4, 8),
    title = "High Entropy + Medium Expression"
)

5.4.4 Output interpretation

In the examples above, plot_rope(...) draws the rope-style dominance plot and returns a single data.frame (see the str(res_rope) output), with one row per observation, containing:

  • a, b
    The original values from each of the two input columns you passed (e.g. the two TPM values).

  • comx, comy
    The computed Cartesian coordinates for each point on the “rope”. In the outputs above, comx equals (b - a) / (a + b).

  • color
    The fill color (as a string) actually used for that point; whitesmoke marks points outside the requested ranges.

  • entropy
    The Shannon entropy score for that feature across the two selected columns (between 0 and 1).

  • maxvalue
    The larger of a and b, which is the value that maxvaluerange filters on.
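The comx, maxvalue, and entropy columns can be reproduced from a and b alone. The sketch below does this in base R for the first rows of the SummarizedExperiment example. The formulas are inferred from the printed values rather than taken from the dominatR source, so treat them as an assumption.

```r
# a and b copied from the head(res_rope) output of the SummarizedExperiment example
a <- c(679, 0, 467, 260, 60, 0)
b <- c(1138, 0, 587, 245, 78, 1)

total <- a + b

# Horizontal position: -1 means only `a` is nonzero, +1 means only `b`
comx <- ifelse(total > 0, (b - a) / total, 0)

# Larger of the two values
maxvalue <- pmax(a, b)

# Shannon entropy (base 2) of a set of proportions, skipping zero entries
shannon2 <- function(p) {
    p <- p[p > 0]
    -sum(p * log2(p))
}

# Entropy of the two proportions per row; rows with a zero total get 0
entropy <- mapply(
    function(x, y, s) if (s > 0) shannon2(c(x, y) / s) else 0,
    a, b, total
)

round(data.frame(a, b, comx, maxvalue, entropy), 4)
```

With two columns the entropy is bounded by log2(2) = 1, which is why the rope examples use entropy ranges between 0 and 1.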

5.5 plot_triangle(): Ternary Plot for Three-Way Feature Relationships

This function visualizes three-part compositions (e.g., condition A/B/C contributions) on a ternary plot. It is useful when analyzing data with three mutually exclusive categories or proportions summing to one.

5.5.1 Description

This function is ideal for:

  • Displaying relationships between three mutually exclusive components.

  • Exploring feature allocation among three sources or pathways (e.g., tissue A/B/C).

  • Identifying samples/features located at edge or center of triangular composition space.

## Minimal data preparation
se <- airway[1:1000, ]  # Subset for faster computation
rowData(se)$gene_length <- rowData(se)$gene_seq_end - rowData(se)$gene_seq_start
se <- tpm_normalization(se, log_trans = TRUE, new_assay_name = 'tpm_norm')
df <- as.data.frame(assay(se, 'tpm_norm'))
samples <- c("SRR1039508", "SRR1039512", "SRR1039516")

5.5.2 Using a matrix or data.frame input

5.5.2.1 Example 1: Default behavior

res_triangle <- plot_triangle(
    x = df,
    column_name = samples
)

5.5.2.2 Example 2. Custom Colors

res_triangle <- plot_triangle(
    x = df,
    column_name = samples,
    col = c('indianred', 'lightgreen', 'lightblue')
)

5.5.2.3 Example 3. Low Entropy Genes (0-0.4)

res_triangle <- plot_triangle(
    x = df,
    column_name = samples,
    col = c('indianred', 'lightgreen', 'lightblue'),
    entropyrange = c(0, 0.4)
)

5.5.2.4 Example 4. Medium Entropy Genes (0.4-1.3)

res_triangle <- plot_triangle(
    x = df,
    column_name = samples,
    col = c('indianred', 'lightgreen', 'lightblue'),
    entropyrange = c(0.4, 1.3)
)

5.5.2.5 Example 5. High Entropy Genes (1.3-Inf)

res_triangle <- plot_triangle(
    x = df,
    column_name = samples,
    col = c('indianred', 'lightgreen', 'lightblue'),
    entropyrange = c(1.3, Inf)
)

5.5.2.6 Example 6. High Entropy + Expression range (2-Inf)

res_triangle <- plot_triangle(
    x = df,
    column_name = samples,
    col = c('indianred', 'lightgreen', 'lightblue'),
    entropyrange = c(1.2, Inf),
    maxvaluerange = c(2, Inf)
)

5.5.2.7 Example 7. High Entropy + Higher expression (5-Inf)

res_triangle <- plot_triangle(
    x = df,
    column_name = samples,
    col = c('indianred', 'lightgreen', 'lightblue'),
    entropyrange = c(1.2, Inf),
    maxvaluerange = c(5, Inf)
)

5.5.2.8 Example 8. High Entropy + Higher expression (10-Inf)

res_triangle <- plot_triangle(
    x = df,
    column_name = samples,
    col = c('indianred', 'lightgreen', 'lightblue'),
    entropyrange = c(1.2, Inf),
    maxvaluerange = c(10, Inf)
)

5.5.2.9 Example 9. Remove Background Points

res_triangle <- plot_triangle(
    x = df,
    column_name = samples,
    col = c('indianred', 'lightgreen', 'lightblue'),
    entropyrange = c(1.2, Inf),
    maxvaluerange = c(2, Inf),
    plotAll = FALSE
)

5.5.3 Using a SummarizedExperiment object

5.5.3.1 Example 1. Low entropy (0-0.4)

res_triangle <- plot_triangle(
    x = se,
    column_name = samples,
    col = c('darkred', 'darkgreen', 'darkblue'),
    entropyrange = c(0, 0.4),
    maxvaluerange = c(0.1, Inf),
    assay_name = 'tpm_norm'
)

5.5.3.2 Example 2. Medium entropy (0.4-1.3)

res_triangle <- plot_triangle(
    x = se,
    column_name = samples,
    col = c('darkred', 'darkgreen', 'darkblue'),
    entropyrange = c(0.4, 1.3),
    maxvaluerange = c(0.1, Inf),
    assay_name = 'tpm_norm'
)

5.5.3.3 Example 3. High entropy (1.3-Inf)

res_triangle <- plot_triangle(
    x = se,
    column_name = samples,
    col = c('darkred', 'darkgreen', 'darkblue'),
    entropyrange = c(1.3, Inf),
    maxvaluerange = c(0.1, Inf),
    assay_name = 'tpm_norm'
)

5.5.3.4 Example 4. Output Data Retrieval

triangle_data <- plot_triangle(
    x = se,
    column_name = samples,
    output_table = TRUE,
    entropyrange = c(1.3, Inf),
    maxvaluerange = c(0.1, Inf),
    assay_name = 'tpm_norm'
)

# View first 6 rows of the output data
head(triangle_data)
#>                 max_counts          comx         comy         a         b
#> ENSG00000000003  10.488053 -1.319600e-02 -0.007711040 0.3281926 0.3282850
#> ENSG00000000005   0.000000  0.000000e+00  0.000000000 0.0000000 0.0000000
#> ENSG00000000419   8.470897 -6.522825e-05 -0.002059715 0.3319602 0.3339822
#> ENSG00000000457   6.668775  8.474045e-04  0.018066565 0.3453777 0.3278004
#> ENSG00000000460   2.765757 -9.825204e-02  0.045000137 0.3633334 0.2616074
#> ENSG00000000938   1.111869  2.018128e-01 -0.500000000 0.0000000 0.6165167
#>                         c   Entropy      color
#> ENSG00000000003 0.3435224 1.5846272   darkblue
#> ENSG00000000005 0.0000000 0.0000000 whitesmoke
#> ENSG00000000419 0.3340576 1.5849564   darkblue
#> ENSG00000000457 0.3268219 1.5844933    darkred
#> ENSG00000000460 0.3750591 1.5674206   darkblue
#> ENSG00000000938 0.3834833 0.9604651 whitesmoke

5.5.4 Output interpretation

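With output_table = TRUE, plot_triangle(...) returns a data.frame with one row per feature, containing:

  • max_counts
    The maximum normalized expression value (across your selected samples) for that feature.

  • comx, comy
    The x- and y-coordinates used to place that point inside the triangle.

  • a, b, c
    The share of each of the three selected columns in the row total; the three values sum to one, or are all zero when the feature has no expression.

  • Entropy
    The Shannon entropy (base 2) of a, b, and c, ranging from 0 to log2(3) ≈ 1.585.

  • color
    Which of your provided (or default) colors was applied (one per sample), or whitesmoke for filtered points.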
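The a, b, c, Entropy, and max_counts columns in the table above can be checked by hand. The sketch below recomputes them in base R for ENSG00000000938, using its three log-TPM values as printed earlier in this vignette. The formulas are inferred from the output, not taken from the dominatR source.

```r
# log-TPM values for ENSG00000000938 in SRR1039508, SRR1039512, SRR1039516
# (copied from outputs printed earlier in this vignette)
x <- c(0, 1.111869, 0.6916003)

# Share of each sample in the row total: columns a, b, c
shares <- x / sum(x)

# Shannon entropy (base 2) of the shares, skipping zero shares
nonzero <- shares[shares > 0]
entropy <- -sum(nonzero * log2(nonzero))

# Largest of the three values: column max_counts
max_counts <- max(x)

round(c(a = shares[1], b = shares[2], c = shares[3],
        Entropy = entropy, max_counts = max_counts), 4)
```

With three samples the entropy ranges from 0 to log2(3) ≈ 1.585, which is why the examples above use cutoffs such as 0.4 and 1.3. This gene's entropy of about 0.96 falls below the 1.3 cutoff used in Example 4, consistent with its whitesmoke color.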

#> R version 4.4.3 (2025-02-28)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sonoma 14.4.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/Chicago
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] ggplot2_3.5.2               airway_1.26.0              
#>  [3] SummarizedExperiment_1.36.0 Biobase_2.66.0             
#>  [5] GenomicRanges_1.58.0        GenomeInfoDb_1.42.3        
#>  [7] IRanges_2.40.1              S4Vectors_0.44.0           
#>  [9] BiocGenerics_0.52.0         MatrixGenerics_1.18.1      
#> [11] matrixStats_1.5.0           knitr_1.50                 
#> [13] BiocStyle_2.34.0            dominatR_0.1.0             
#> 
#> loaded via a namespace (and not attached):
#>  [1] remotes_2.5.0           rlang_1.1.6             magrittr_2.0.3         
#>  [4] compiler_4.4.3          roxygen2_7.3.2          systemfonts_1.2.3      
#>  [7] vctrs_0.6.5             stringr_1.5.1           profvis_0.4.0          
#> [10] pkgconfig_2.0.3         crayon_1.5.3            fastmap_1.2.0          
#> [13] XVector_0.46.0          ellipsis_0.3.2          labeling_0.4.3         
#> [16] promises_1.3.3          rmarkdown_2.29          sessioninfo_1.2.3      
#> [19] tzdb_0.5.0              UCSC.utils_1.2.0        tinytex_0.57           
#> [22] purrr_1.0.4             xfun_0.52               zlibbioc_1.52.0        
#> [25] cachem_1.1.0            jsonlite_2.0.0          later_1.4.2            
#> [28] DelayedArray_0.32.0     tweenr_2.0.3            R6_2.6.1               
#> [31] bslib_0.9.0             stringi_1.8.7           RColorBrewer_1.1-3     
#> [34] pkgload_1.4.0           lubridate_1.9.4         jquerylib_0.1.4        
#> [37] Rcpp_1.1.0              bookdown_0.43           usethis_3.1.0          
#> [40] readr_2.1.5             httpuv_1.6.16           Matrix_1.7-3           
#> [43] timechange_0.3.0        tidyselect_1.2.1        rstudioapi_0.17.1      
#> [46] abind_1.4-8             yaml_2.3.10             miniUI_0.1.2           
#> [49] pkgbuild_1.4.8          lattice_0.22-7          tibble_3.3.0           
#> [52] shiny_1.11.1            withr_3.0.2             evaluate_1.0.4         
#> [55] desc_1.4.3              urlchecker_1.0.1        polyclip_1.10-7        
#> [58] xml2_1.3.8              pillar_1.11.0           BiocManager_1.30.26    
#> [61] generics_0.1.4          rprojroot_2.0.4         hms_1.1.3              
#> [64] scales_1.4.0            xtable_1.8-4            glue_1.8.0             
#> [67] tools_4.4.3             ggnewscale_0.5.2        forcats_1.0.0          
#> [70] fs_1.6.6                grid_4.4.3              tidyr_1.3.1            
#> [73] tidyverse_2.0.0         devtools_2.4.5          GenomeInfoDbData_1.2.13
#> [76] geomtextpath_0.1.5      ggforce_0.5.0           cli_3.6.5              
#> [79] textshaping_1.0.1       S4Arrays_1.6.0          dplyr_1.1.4            
#> [82] gtable_0.3.6            sass_0.4.10             digest_0.6.37          
#> [85] SparseArray_1.6.2       htmlwidgets_1.6.4       farver_2.1.2           
#> [88] memoise_2.0.1           htmltools_0.5.8.1       lifecycle_1.0.4        
#> [91] httr_1.4.7              mime_0.13               MASS_7.3-65